Methods for increasing accuracy of nucleic acid sequencing

ABSTRACT

The invention provides methods for improving the accuracy of a sequencing-by-synthesis reaction by sequencing at least a portion of a template and at least a portion of template complementary sequence.

TECHNICAL FIELD OF THE INVENTION

The invention generally relates to methods for increasing accuracy in nucleic acid synthesis reactions.

BACKGROUND OF THE INVENTION

In vitro nucleic acid synthesis is a foundation of many fundamental research and diagnostic tools, such as nucleic acid amplification and sequencing. In a template-dependent nucleic acid synthesis reaction, the sequential addition of nucleotides is catalyzed by a nucleic acid polymerase. Depending on the template and the nature of the reaction, the nucleic acid polymerase may be a DNA polymerase, an RNA polymerase, or a reverse transcriptase.

The accuracy of template-dependent nucleic acid synthesis depends in part on the ability of the polymerase to discriminate between complementary and non-complementary nucleotides. Normally, the conformation of the polymerase enzyme favors incorporation of the complementary nucleotide. However, there is still an identifiable rate of misincorporation that depends upon factors such as local sequence and the base to be incorporated.

Synthetic or modified nucleotides and analogs, such as labeled nucleotides, tend to be incorporated into a primer less efficiently than naturally-occurring nucleotides. The reduced efficiency with which the unconventional nucleotides are incorporated by the polymerase can adversely affect the performance of sequencing techniques that depend upon faithful incorporation of such unconventional nucleotides.

Single molecule sequencing techniques allow the evaluation of individual nucleic acid molecules in order to identify changes and/or differences affecting genomic function. In single molecule techniques, a nucleic acid fragment is attached to a solid support such that at least a portion of the nucleic acid fragment is individually optically-resolvable. Sequencing is conducted using the fragments as templates. Sequencing events are detected and correlated to the individual strands. See Braslavsky et al., Proc. Natl. Acad. Sci., 100: 3960-64 (2003), incorporated by reference herein. Because single molecule techniques do not rely on ensemble averaging as do bulk techniques, errors due to misincorporation can have a significant deleterious effect on the sequencing results. The incorporation of a nucleotide that is incorrectly paired, under standard Watson and Crick base-pairing, with a corresponding template nucleotide during primer extension may result in sequencing errors. Furthermore, where the template being sequenced is present in only one or a few copies in the sample (a rare template), misincorporations can have a great impact on the sequence obtained because fewer sequences are available to compare with each other or with a reference sequence.

There is, therefore, a need in the art for methods for improving the accuracy of nucleic acid synthesis reactions, especially in single molecule sequencing.

SUMMARY OF THE INVENTION

The invention improves the accuracy of nucleic acid sequencing reactions. According to the invention, a template nucleic acid is hybridized to a primer and template-dependent sequencing-by-synthesis is conducted to extend the 3′ end of the primer. The template is then removed from the extended primer and the primer is then “primed” and re-sequenced. Practice of the invention allows resequencing of the same sequence and its complement in situ, and results in increased accuracy of sequence determination.

According to the invention, a polymerization reaction is conducted on a nucleic acid duplex that comprises a primer hybridized to a template nucleic acid. The reaction is conducted in the presence of a polymerase and at least one nucleotide comprising a detectable label. If the nucleotide is complementary to the next nucleotide in the template, it is added to the primer by the polymerase. The added nucleotide is detected and the reaction is then repeated at least once. Thus, the primer is extended by one or more nucleotides corresponding to sequence that is complementary to at least a portion of the template. The template is removed from the duplex, leaving the extended primer.

In one embodiment, one or more primer/template duplexes are bound to a solid support such that at least some of the duplexes are individually optically detectable. The duplexes are exposed to a polymerase and at least one detectably-labeled nucleotide under conditions sufficient for template-dependent nucleotide addition to the primer. Unincorporated labeled nucleotides are optionally washed away. The incorporation of the labeled nucleotide is detected, thereby identifying the added nucleotide and the complementary template nucleotide. Base addition, washing, and identification steps can be serially repeated in the presence of detectably labeled nucleotide that corresponds to each of the other nucleotide species. As a result, the primer is extended by the added nucleotides. The added nucleotides correspond to sequence that is complementary to at least a portion of the template.

After one or more primer extension steps, the template is removed from the duplex. The template can be removed by any suitable means, for example by raising the temperature of the surface or the flow cell such that the duplex is melted, or by changing the buffer conditions to destabilize the duplex, or a combination thereof. Methods for melting template/primer duplexes are well known in the art and are described, for example, in chapter 10 of Molecular Cloning, a Laboratory Manual, 3rd Edition, J. Sambrook and D. W. Russell, Cold Spring Harbor Press (2001), the teachings of which are incorporated herein by reference. The template can then be removed from the surface, for example, by rinsing the surface with a suitable rinsing solution.

After removing the template, the extended primer used in the polymerization reaction remains on the surface. The 3′ terminus of the primer is then modified by addition of a short polynucleotide. The polynucleotide is added to the primer by enzymatic catalysis. A preferred enzyme is a ligase or a polymerase. Suitable ligases include, for example, T4 DNA ligase and T4 RNA ligase (such ligases are available commercially from New England BioLabs (on the World Wide Web at NEB.com)), and others capable of adding nucleotides to the 3′ terminus of the primer. In a preferred embodiment, a dephosphorylated polynucleotide is added to the primer. Methods for using ligases and dephosphorylating oligonucleotides are well known in the art.

If polymerization is used to add polynucleotides to the 3′ terminus of the primer, any suitable enzyme can be used. For example, a polymerase, such as poly(A) polymerase, including yeast poly(A) polymerase, commercially available from USB (on the World Wide Web at USBweb.com), terminal deoxyribonucleotidyl transferase (TdT), and the like are useful. The polymerases can be used according to the manufacturer's instructions.

Having been modified as described above, the primer is then used as a template for template-dependent sequencing-by-synthesis as described generally above.

The polynucleotide added to the primer is chosen such that it is complementary to a new primer (or at least a portion thereof). In a preferred embodiment, the polynucleotide is a homopolymer, such as oligo(dA), and the corresponding primer includes an oligo(dT) sequence. The complementary sequences are of a length suitable for hybridization. The added polynucleotide and its complementary new primer can be about 10 to about 100 nucleotides in length, and preferably about 50 nucleotides in length. The added polynucleotide and new primer can be of the same length or of different lengths. It is routine in the art to adjust primer length and/or oligonucleotide length to optimize hybridization.

Once a polynucleotide is added to the 3′ end of the primer and a new primer sequence is hybridized to the polynucleotide (or portion thereof), template-dependent sequencing-by-synthesis is conducted on the primer in the opposite direction of the original sequencing reaction (i.e., toward the surface to which the primer is bound).

After conducting the sequencing reaction back toward the surface, the “new” extended primer can be melted off, leaving a template having a sequence complementary to that of the original template for optional resequencing in the 3′ to 5′ direction (i.e., toward the surface).

Sequencing and/or resequencing at least a portion of the complement of the original template increases the accuracy of the sequence information obtained from a given template by providing more than one set of sequence information to compare, for example, to a reference sequence. In another embodiment, the sequence initially obtained can be compared to the sequence obtained from the new template.
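By way of illustration only, the comparison of the sequence initially obtained with the sequence obtained from the new template might be sketched as follows. The sketch assumes that both reads are written 5′ to 3′, that the second read begins at the 3′ end of the sequence added in the first pass, and that the reads contain no insertions or deletions. The function names are hypothetical and do not appear elsewhere in this specification.

```python
# Hypothetical sketch: flag positions where two passes over the same
# molecule disagree. Assumes substitution errors only (no indels).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def compare_passes(first_read: str, second_read: str):
    """Compare the first-pass read with the second-pass (opposite strand) read.

    Returns a list of (position_in_first_read, first_base, second_base)
    tuples for every position in the overlap where the passes disagree.
    """
    # Put the second read into the same orientation as the first read.
    expected = reverse_complement(second_read)
    # The second pass starts at the 3' end of the first read, so the
    # overlap is right-aligned.
    overlap = min(len(first_read), len(expected))
    first_offset = len(first_read) - overlap
    second_offset = len(expected) - overlap
    mismatches = []
    for i in range(overlap):
        a = first_read[first_offset + i]
        b = expected[second_offset + i]
        if a != b:
            mismatches.append((first_offset + i, a, b))
    return mismatches
```

Positions reported by such a comparison could be treated as low-confidence calls, or resolved against a reference sequence where one is available.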

Sequencing methods of the invention preferably comprise template/primer duplex attached to a surface. Individual nucleotides added to the surface comprise a detectable label, preferably an optically-detectable label, such as a fluorescent label. Each nucleotide species can comprise a different label, or can comprise the same label. In a preferred embodiment, each duplex is individually optically resolvable in order to facilitate single molecule sequence discrimination. The choice of a surface for attachment of duplex depends upon the detection method employed. Preferred surfaces for methods of the invention include epoxide surfaces and polyelectrolyte multilayer surfaces, such as those described in Braslavsky, et al., supra. Surfaces preferably are deposited on a substrate that is amenable to optical detection of the surface chemistry, such as glass or silica.

Nucleotides useful in the invention include any nucleotide or nucleotide analog, whether naturally-occurring or synthetic. For example, preferred nucleotides include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine.

Polymerases useful in the invention include any nucleic acid polymerase capable of catalyzing a template-dependent addition of a nucleotide or nucleotide analog to a primer. Depending on the characteristics of the target nucleic acid, a DNA polymerase, an RNA polymerase, a reverse transcriptase, or a mutant or altered form of any of the foregoing can be used. According to one aspect of the invention, a thermophilic polymerase is used, such as ThermoSequenase®, 9°N™, Therminator™, Taq, Tne, Tma, Pfu, Tfl, Tth, Tli, Stoffel fragment, Vent™ and Deep Vent™ DNA polymerase.

BRIEF DESCRIPTION OF THE DRAWING

The FIGURE shows a schematic representation of one embodiment of the present invention.

DETAILED DESCRIPTION

The invention provides methods and compositions for improving the accuracy of nucleic acid sequencing-by-synthesis reactions by re-sequencing at least a portion of a template nucleic acid. While applicable to bulk sequencing methods, the invention is particularly useful in connection with single molecule sequencing methods. According to the invention, the methods comprise the steps of exposing a duplex comprising a template and a primer to a polymerase and one or more nucleotides comprising a detectable label under conditions sufficient for template-dependent nucleotide addition to the primer. In one embodiment, the template is individually optically resolvable. Any unincorporated labeled nucleotide is optionally washed away. Any nucleotide incorporated into the primer is identified by detecting the label associated with the incorporated nucleotide. The steps of exposing duplex to polymerase and another nucleotide comprising a detectable label and polymerizing, optional washing, and identification are repeated, thereby determining a nucleotide sequence. As a result of the exposing and polymerizing steps, the primer is extended by the addition of one or more nucleotides that are complementary to the corresponding positions of the template. The template is then removed from the duplex, leaving the extended primer. A polynucleotide is added to the 3′ terminus of the primer or extended primer, thereby forming a modified primer. The modified primer is used as the template in subsequent sequencing reactions.

The FIGURE is a schematic representation of one embodiment of the invention. In this embodiment, a primer, 2, is attached to a solid support, 3. A template, 1, is hybridized to the primer, forming a template/primer duplex. In step A, the template/primer duplex is exposed to a polymerase and at least one nucleotide comprising a detectable label, under conditions sufficient for template-dependent nucleotide addition to said primer. If the nucleotide is complementary to the template nucleotide immediately downstream of the primer, a nucleotide is added to the primer. After identifying nucleotide incorporated into the primer, the process is repeated, thereby adding a second nucleotide to the primer in a template-dependent manner, and so on. As shown in the FIGURE, as a result of repeating the process, template complementary sequence, 4, is added to the primer. After the process has been repeated the desired number of times, the template is removed as shown in step B, leaving the extended primer. In step C, a polynucleotide, 6, is added to the extended primer at the 3′ terminus (e.g., downstream of the previously added template complementary sequence) forming a new template for sequencing in the opposite direction. In step D, a primer capable of hybridizing to the polynucleotide is added, forming a template/primer duplex, 7. The process of adding nucleotide and polymerase, detecting incorporated nucleotide and repeating the desired number of times is then repeated using the modified extended primer as a template, as shown in step E, thereby sequencing the template complementary sequence 4, 8. As a result, the primer, 6, is extended by the addition of sequence 8 that corresponds to at least a portion of the original template, 1. The extended primer (6, 8) can be removed, F, and the template can be resequenced, as described above.
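The strand relationships in steps A through E can be illustrated with a short sketch in which each strand is modeled as a plain string. This is a simplified model under stated assumptions, not a description of any particular implementation: the template is supplied in the order in which the polymerase reads it (3′ to 5′), the added polynucleotide is assumed to be oligo(dA) with an oligo(dT) primer as in the preferred embodiment, and the function and variable names are hypothetical.

```python
# Hypothetical sketch of the strand bookkeeping in steps A-E.
PAIR = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, without reversing the string."""
    return seq.translate(PAIR)


def resequencing_workflow(template_3to5: str, tail_len: int = 50):
    """Model steps A-E for a template given in the order it is read (3'->5')."""
    # Step A: the primer is extended; the added sequence (4), written
    # 5'->3', pairs base-for-base with the template read 3'->5'.
    added = complement(template_3to5)
    # Step B: the template is removed, leaving the extended primer.
    # Step C: a homopolymer tail (6) is added to the 3' terminus.
    modified_primer = added + "A" * tail_len
    # Step D: an oligo(dT) primer hybridizes to the tail.
    new_primer = "T" * tail_len
    # Step E: the new primer is extended back toward the surface, copying
    # the added sequence starting from its 3' end; result written 5'->3'.
    readback = complement(added[::-1])
    return added, modified_primer, new_primer, readback
```

Under these assumptions the sequence obtained in step E, written 5′ to 3′, is identical to the original template written 5′ to 3′, which is why the two passes can be cross-checked against each other.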

In a preferred embodiment of the invention, direct amine attachment is used to attach primer or template to an epoxide surface. The primer or the template can comprise an optically-detectable label in order to determine the location of duplex on the surface. At least a portion of the duplex is optically resolvable from other duplexes on the surface. The surface is preferably passivated with a reagent that occupies portions of the surface that might, absent passivation, fluoresce. Optimal passivation reagents include amines, phosphate, water, sulfates, detergents, and other reagents that reduce native or accumulating surface fluorescence. Sequencing is then accomplished by presenting one or more labeled nucleotides in the presence of a polymerase under conditions that promote complementary base incorporation in the primer. In a preferred embodiment, one base at a time (per cycle) is added and all bases have the same label. There is a wash step after each incorporation cycle, and the label is either neutralized without removal or removed from incorporated nucleotides. After the completion of a predetermined number of cycles of base addition, the linear sequence data for each individual duplex is compiled. Numerous algorithms are available for sequence compilation and alignment as discussed below.

In general, epoxide-coated glass surfaces are used for direct amine attachment of templates, primers, or both. Amine attachment to the termini of template and primer molecules is accomplished using terminal transferase. Primer molecules can be custom-synthesized to hybridize to templates for duplex formation.

A full-cycle is conducted as many times as necessary to complete sequencing of a desired length of template, or resequencing of the desired length of the template complementary sequence. Once the desired number of cycles is complete, the result is a stack of images represented in a computer database. For each spot on the surface that contained an initial individual duplex, there will be a series of light and dark image coordinates, corresponding to whether a base was incorporated in any given cycle. For example, if the template sequence was TACGTACG and nucleotides were presented in the order CAGU(T), then the duplex would be “dark” (i.e., no detectable signal) for the first cycle (presentation of C), but would show signal in the second cycle (presentation of A, which is complementary to the first T in the template sequence). The same duplex would be dark upon presentation of the G, as that nucleotide is not complementary to the next available base in the template, A. Upon the next cycle (presentation of U(T)), the duplex would show signal, as that nucleotide is complementary to the A. Upon presentation of numerous cycles, the sequence of the template would be built up through the image stack. The sequencing data are then fed into an aligner as described below for resequencing, or are compiled for de novo sequencing as the linear order of nucleotides incorporated into the primer.
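The way a light/dark series maps onto a sequence can be sketched with a small simulation. The sketch is illustrative only and makes several assumptions that go beyond the text: at most one nucleotide is incorporated per cycle, incorporation is error-free, T stands in for U(T), and the template string is written in the order in which its bases are read. The template used in the usage note is a hypothetical four-base example, and the function name is not part of this specification.

```python
# Hypothetical sketch: simulate which cycles show signal for one duplex.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_cycles(template: str, order: str = "CAGT", max_cycles: int = 40):
    """Return (signals, read) for a template presented with cycled nucleotides.

    signals is a list of booleans, one per cycle (True = light, False = dark).
    read is the linear order of nucleotides incorporated into the primer.
    """
    position = 0  # index of the next unread template base
    signals = []
    read = []
    for cycle in range(max_cycles):
        if position >= len(template):
            break  # template fully read
        presented = order[cycle % len(order)]
        incorporated = COMPLEMENT[template[position]] == presented
        signals.append(incorporated)
        if incorporated:
            read.append(presented)
            position += 1
    return signals, "".join(read)
```

For a hypothetical template TGCA, `simulate_cycles("TGCA")` gives the series dark, light, dark, dark, light, dark, light, light and the read ACGT, which is the complement of the template.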

The imaging system used in practice of the invention can be any system that provides sufficient illumination of the sequencing surface at a magnification such that single fluorescent molecules can be resolved.

General Considerations

A. Nucleic Acid Templates

Nucleic acid templates include deoxyribonucleic acid (DNA) and/or ribonucleic acid (RNA). Nucleic acid template molecules can be isolated from a biological sample containing a variety of other components, such as proteins, lipids and non-template nucleic acids. Nucleic acid template molecules can be obtained from any cellular material, obtained from an animal, plant, bacterium, fungus, or any other cellular organism. Biological samples for use in the invention also include viral particles or samples prepared from viral material. Nucleic acid template molecules may be obtained directly from an organism or from a biological sample obtained from an organism, e.g., from blood, urine, cerebrospinal fluid, seminal fluid, saliva, sputum, stool and tissue. Any tissue or body fluid specimen may be used as a source for nucleic acid for use in the invention. Nucleic acid template molecules may also be isolated from cultured cells, such as a primary cell culture or a cell line. The cells or tissues from which template nucleic acids are obtained can be infected with a virus or other intracellular pathogen. A sample can also be total RNA extracted from a biological specimen, a cDNA library, viral, or genomic DNA.

Nucleic acid obtained from biological samples typically is fragmented to produce suitable fragments for analysis. In one embodiment, nucleic acid from a biological sample is fragmented by sonication. Nucleic acid template molecules can be obtained as described in U.S. Patent Application 2002/0190663 A1, published Oct. 9, 2003, the teachings of which are incorporated herein in their entirety. Generally, nucleic acid can be extracted from a biological sample by a variety of techniques such as those described by Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281 (1982). Generally, individual nucleic acid template molecules can be from about 5 bases to about 20 kb. Nucleic acid molecules may be single-stranded, double-stranded, or double-stranded with single-stranded regions (for example, stem- and loop-structures).

A biological sample as described herein may be homogenized or fractionated in the presence of a detergent or surfactant. The concentration of the detergent in the buffer may be about 0.05% to about 10.0%. The concentration of the detergent can be up to an amount where the detergent remains soluble in the solution. In a preferred embodiment, the concentration of the detergent is between 0.1% and about 2%. The detergent, particularly a mild one that is nondenaturing, can act to solubilize the sample. Detergents may be ionic or nonionic. Examples of nonionic detergents include triton, such as the Triton® X series (Triton® X-100 t-Oct-C₆H₄-(OCH₂-CH₂)ₓOH, x=9-10, Triton® X-100R, Triton® X-114 x=7-8), octyl glucoside, polyoxyethylene(9)dodecyl ether, digitonin, IGEPAL® CA630 octylphenyl polyethylene glycol, n-octyl-beta-D-glucopyranoside (betaOG), n-dodecyl-beta, Tween® 20 polyethylene glycol sorbitan monolaurate, Tween® 80 polyethylene glycol sorbitan monooleate, polidocanol, n-dodecyl beta-D-maltoside (DDM), NP-40 nonylphenyl polyethylene glycol, C12E8 (octaethylene glycol n-dodecyl monoether), hexaethyleneglycol mono-n-tetradecyl ether (C14EO6), octyl-beta-thioglucopyranoside (octyl thioglucoside, OTG), Emulgen, and polyoxyethylene 10 lauryl ether (C12E10). Examples of ionic detergents (anionic or cationic) include deoxycholate, sodium dodecyl sulfate (SDS), N-lauroylsarcosine, and cetyltrimethylammonium bromide (CTAB). A zwitterionic reagent may also be used in the purification schemes of the present invention, such as Chaps, zwitterion 3-14, and 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate. It is contemplated also that urea may be added with or without another detergent or surfactant.

Lysis or homogenization solutions may further contain other agents, such as reducing agents. Examples of such reducing agents include dithiothreitol (DTT), β-mercaptoethanol, DTE, GSH, cysteine, cysteamine, tricarboxyethyl phosphine (TCEP), or salts of sulfurous acid.

B. Nucleotides

Nucleotides useful in the invention include any nucleotide or nucleotide analog, whether naturally-occurring or synthetic. For example, preferred nucleotides include phosphate esters of deoxyadenosine, deoxycytidine, deoxyguanosine, deoxythymidine, adenosine, cytidine, guanosine, and uridine. Other nucleotides useful in the invention comprise an adenine, cytosine, guanine, thymine base, a xanthine or hypoxanthine; 5-bromouracil, 2-aminopurine, deoxyinosine, or methylated cytosine, such as 5-methylcytosine, and N4-methoxydeoxycytosine. Also included are bases of polynucleotide mimetics, such as methylated nucleic acids, e.g., 2′-O-methRNA, peptide nucleic acids, modified peptide nucleic acids, locked nucleic acids and any other structural moiety that can act substantially like a nucleotide or base, for example, by exhibiting base-complementarity with one or more bases that occur in DNA or RNA and/or being capable of base-complementary incorporation, and include chain-terminating analogs. A nucleotide corresponds to a specific nucleotide species if they share base-complementarity with respect to at least one base.

Nucleotides for nucleic acid sequencing according to the invention preferably comprise a detectable label that is directly or indirectly detectable. Preferred labels include optically-detectable labels, such as fluorescent labels. Examples of fluorescent labels include, but are not limited to, 4-acetamido-4′-isothiocyanatostilbene-2,2′disulfonic acid; acridine and derivatives: acridine, acridine isothiocyanate; 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS); 4-amino-N-[3-vinylsulfonyl)phenyl]naphthalimide-3,5 disulfonate; N-(4-anilino-1-naphthyl)maleimide; anthranilamide; BODIPY; Brilliant Yellow; coumarin and derivatives: coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumarin 151); cyanine dyes; cyanosine; 4′,6-diamidino-2-phenylindole (DAPI); 5′,5″-dibromopyrogallol-sulfonaphthalein (Bromopyrogallol Red); 7-diethylamino-3-(4′-isothiocyanatophenyl)-4-methylcoumarin; diethylenetriamine pentaacetate; 4,4′-diisothiocyanatodihydro-stilbene-2,2′-disulfonic acid; 4,4′-diisothiocyanatostilbene-2,2′-disulfonic acid; 5-[dimethylamino]naphthalene-1-sulfonyl chloride (DNS, dansyl chloride); 4-dimethylaminophenylazophenyl-4′-isothiocyanate (DABITC); eosin and derivatives: eosin, eosin isothiocyanate; erythrosin and derivatives: erythrosin B, erythrosin isothiocyanate; ethidium; fluorescein and derivatives: 5-carboxyfluorescein (FAM), 5-(4,6-dichlorotriazin-2-yl)aminofluorescein (DTAF), 2′,7′-dimethoxy-4′5′-dichloro-6-carboxyfluorescein, fluorescein, fluorescein isothiocyanate, QFITC, (XRITC); fluorescamine; IR144; IR1446; Malachite Green isothiocyanate; 4-methylumbelliferone; orthocresolphthalein; nitrotyrosine; pararosaniline; Phenol Red; B-phycoerythrin; o-phthaldialdehyde; pyrene and derivatives: pyrene, pyrene butyrate, succinimidyl 1-pyrene butyrate; quantum dots; Reactive Red 4 (Cibacron™ Brilliant Red 3B-A); rhodamine and derivatives: 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride, rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101, sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid; terbium chelate derivatives; Cy3; Cy5; Cy5.5; Cy7; IRD 700; IRD 800; La Jolla Blue; phthalo cyanine; and naphthalo cyanine. Preferred fluorescent labels are cyanine-3 and cyanine-5. Labels other than fluorescent labels are contemplated by the invention, including other optically-detectable labels.

C. Nucleic Acid Polymerases

Nucleic acid polymerases generally useful in the invention include DNA polymerases, RNA polymerases, reverse transcriptases, and mutant or altered forms of any of the foregoing. DNA polymerases and their properties are described in detail in, among other places, DNA Replication 2nd edition, Kornberg and Baker, W. H. Freeman, New York, N.Y. (1991). Known conventional DNA polymerases useful in the invention include, but are not limited to, Pyrococcus furiosus (Pfu) DNA polymerase (Lundberg et al., 1991, Gene, 108: 1, Stratagene), Pyrococcus woesei (Pwo) DNA polymerase (Hinnisdaels et al., 1996, Biotechniques, 20:186-8, Boehringer Mannheim), Thermus thermophilus (Tth) DNA polymerase (Myers and Gelfand 1991, Biochemistry 30:7661), Bacillus stearothermophilus DNA polymerase (Stenesh and McGowan, 1977, Biochim Biophys Acta 475:32), Thermococcus litoralis (Tli) DNA polymerase (also referred to as Vent™ DNA polymerase, Cariello et al., 1991, Nucleic Acids Res, 19: 4193, New England Biolabs), 9°Nm™ DNA polymerase (New England Biolabs), Stoffel fragment, ThermoSequenase® (Amersham Pharmacia Biotech UK), Therminator™ (New England Biolabs), Thermotoga maritima (Tma) DNA polymerase (Diaz and Sabino, 1998 Braz J Med. Res, 31:1239), Thermus aquaticus (Taq) DNA polymerase (Chien et al., 1976, J. Bacteriol, 127: 1550), Pyrococcus kodakaraensis KOD DNA polymerase (Takagi et al., 1997, Appl. Environ. Microbiol. 63:4504), JDF-3 DNA polymerase (from Thermococcus sp. JDF-3, Patent application WO 0132887), Pyrococcus GB-D (PGB-D) DNA polymerase (also referred to as Deep Vent™ DNA polymerase, Juncosa-Ginesta et al., 1994, Biotechniques, 16:820, New England Biolabs), UlTma DNA polymerase (from thermophile Thermotoga maritima; Diaz and Sabino, 1998 Braz J. Med. Res, 31:1239; PE Applied Biosystems), Tgo DNA polymerase (from Thermococcus gorgonarius, Roche Molecular Biochemicals), E. coli DNA polymerase I (Lecomte and Doubleday, 1983, Nucleic Acids Res. 11:7505), T7 DNA polymerase (Nordstrom et al., 1981, J Biol. Chem. 256:3112), and archaeal DP1I/DP2 DNA polymerase II (Cann et al., 1998, Proc Natl Acad. Sci. USA 95:14250-5).

While mesophilic polymerases are contemplated by the invention, preferred polymerases are thermophilic. Thermophilic DNA polymerases include, but are not limited to, ThermoSequenase®, 9°N™, Therminator™, Taq, Tne, Tma, Pfu, Tfl, Tth, Tli, Stoffel fragment, Vent™ and Deep Vent™ DNA polymerase, KOD DNA polymerase, Tgo, JDF-3, and mutants, variants and derivatives thereof.

Reverse transcriptases useful in the invention include, but are not limited to, reverse transcriptases from HIV, HTLV-1, HTLV-II, FeLV, FIV, SIV, AMV, MMTV, MoMuLV and other retroviruses (see Levin, Cell 88:5-8 (1997); Verma, Biochim Biophys Acta. 473:1-38 (1977); Wu et al., CRC Crit Rev Biochem. 3:289-347 (1975)).

D. Surfaces

In a preferred embodiment, nucleic acid template molecules are attached to a substrate (also referred to herein as a surface) and subjected to analysis by sequencing as taught herein. Nucleic acid template molecules are attached to the surface such that the template/primer duplexes are individually optically resolvable. Substrates for use in the invention can be two- or three-dimensional and can comprise a planar surface (e.g., a glass slide) or can be shaped. A substrate can include glass (e.g., controlled pore glass (CPG)), quartz, plastic (such as polystyrene (low cross-linked and high cross-linked polystyrene), polycarbonate, polypropylene and poly(methyl methacrylate)), acrylic copolymer, polyamide, silicon, metal (e.g., alkanethiolate-derivatized gold), cellulose, nylon, latex, dextran, gel matrix (e.g., silica gel), polyacrolein, or composites.

Suitable three-dimensional substrates include, for example, spheres, microparticles, beads, membranes, slides, plates, micromachined chips, tubes (e.g., capillary tubes), microwells, microfluidic devices, channels, filters, or any other structure suitable for anchoring a nucleic acid. Substrates can include planar arrays or matrices capable of having regions that include populations of template nucleic acids or primers. Examples include nucleoside-derivatized CPG and polystyrene slides; derivatized magnetic slides; polystyrene grafted with polyethylene glycol, and the like.

In one embodiment, a substrate is coated to allow optimum optical processing and nucleic acid attachment. Substrates for use in the invention can also be treated to reduce background. Exemplary coatings include epoxides, and derivatized epoxides (e.g., with a binding molecule, such as streptavidin). The surface can also be treated to improve the positioning of attached nucleic acids (e.g., nucleic acid template molecules, primers, or template molecule/primer duplexes) for analysis. As such, a surface according to the invention can be treated with one or more charge layers (e.g., a negative charge) to repel a charged molecule (e.g., a negatively charged labeled nucleotide). For example, a substrate according to the invention can be treated with polyallylamine followed by polyacrylic acid to form a polyelectrolyte multilayer. The carboxyl groups of the polyacrylic acid layer are negatively charged and thus repel negatively charged labeled nucleotides, improving the positioning of the label for detection. Coatings or films applied to the substrate should be able to withstand subsequent treatment steps (e.g., photoexposure, boiling, baking, soaking in warm detergent-containing liquids, and the like) without substantial degradation or disassociation from the substrate.

Examples of substrate coatings include vapor phase coatings of 3-aminopropyltrimethoxysilane, as applied to glass slide products, for example, from Molecular Dynamics, Sunnyvale, Calif. In addition, generally, hydrophobic substrate coatings and films aid in the uniform distribution of hydrophilic molecules on the substrate surfaces. Importantly, in those embodiments of the invention that employ substrate coatings or films, coatings or films that are substantially non-interfering with primer extension and detection steps are preferred. Additionally, it is preferable that any coatings or films applied to the substrates either increase template molecule binding to the substrate or, at least, do not substantially impair template binding.

Various methods can be used to anchor or immobilize the primer to the surface of the substrate. The immobilization can be achieved through direct or indirect bonding to the surface. The bonding can be by covalent linkage. See, Joos et al., Analytical Biochemistry 247:96-101, 1997; Oroskar et al., Clin. Chem. 42:1547-1555, 1996; and Khandjian, Mol. Bio. Rep. 11:107-115, 1986. A preferred attachment is direct amine bonding of a terminal nucleotide of the template or the primer to an epoxide integrated on the surface. The bonding also can be through non-covalent linkage. For example, biotin-streptavidin (Taylor et al., J. Phys. D. Appl. Phys. 24:1443, 1991) and digoxigenin with anti-digoxigenin (Smith et al., Science 253:1122, 1992) are common tools for anchoring nucleic acids to surfaces and parallels. Alternatively, the attachment can be achieved by anchoring a hydrophobic chain into a lipid monolayer or bilayer. Other methods known in the art for attaching nucleic acid molecules to substrates also can be used.

E. Detection

Any detection method may be used that is suitable for the type of label employed. Thus, exemplary detection methods include radioactive detection, optical absorbance detection, e.g., UV-visible absorbance detection, and optical emission detection, e.g., fluorescence or chemiluminescence. For example, extended primers can be detected on a substrate by scanning all or portions of each substrate simultaneously or serially, depending on the scanning method used. For fluorescence labeling, selected regions on a substrate may be serially scanned one-by-one or row-by-row using a fluorescence microscope apparatus, such as described in Fodor (U.S. Pat. No. 5,445,934) and Mathies et al. (U.S. Pat. No. 5,091,652). Devices capable of sensing fluorescence from a single molecule include the scanning tunneling microscope (STM) and the atomic force microscope (AFM). Hybridization patterns may also be scanned using a CCD camera (e.g., Model TE/CCD512SF, Princeton Instruments, Trenton, N.J.) with suitable optics (Ploem, in Fluorescent and Luminescent Probes for Biological Activity, Mason, T. G. Ed., Academic Press, London, pp. 1-11 (1993)), such as described in Yershov et al., Proc. Natl. Acad. Sci. 93:4913 (1996), or may be imaged by TV monitoring. For radioactive signals, a phosphorimager device can be used (Johnston et al., Electrophoresis, 13:566, 1990; Drmanac et al., Electrophoresis, 13:566, 1992; 1993). Other commercial suppliers of imaging instruments include General Scanning Inc. (Watertown, Mass.; on the World Wide Web at genscan.com), Genix Technologies (Waterloo, Ontario, Canada; on the World Wide Web at confocal.com), and Applied Precision Inc. Such detection methods are particularly useful to achieve simultaneous scanning of multiple attached template nucleic acids.

A number of approaches can be used to detect incorporation of fluorescently-labeled nucleotides into a single nucleic acid molecule. Optical setups include near-field scanning microscopy, far-field confocal microscopy, wide-field epi-illumination, light scattering, dark field microscopy, photoconversion, single and/or multiphoton excitation, spectral wavelength discrimination, fluorophore identification, evanescent wave illumination, and total internal reflection fluorescence (TIRF) microscopy. In general, certain methods involve detection of laser-activated fluorescence using a microscope equipped with a camera. Suitable photon detection systems include, but are not limited to, photodiodes and intensified CCD cameras. For example, an intensified charge-coupled device (ICCD) camera can be used. The use of an ICCD camera to image individual fluorescent dye molecules in a fluid near a surface provides numerous advantages. For example, with an ICCD optical setup, it is possible to acquire a sequence of images (movies) of fluorophores.

Some embodiments of the present invention use TIRF microscopy for two-dimensional imaging. TIRF microscopy uses totally internally reflected excitation light and is well known in the art. See, e.g., the World Wide Web at nikon-instruments.jp/eng/page/products/tirf.aspx. In certain embodiments, detection is carried out using evanescent wave illumination and total internal reflection fluorescence microscopy. An evanescent light field can be set up at the surface, for example, to image fluorescently-labeled nucleic acid molecules. When a laser beam is totally reflected at the interface between a liquid and a solid substrate (e.g., a glass), the excitation light beam penetrates only a short distance into the liquid. The optical field does not end abruptly at the reflective interface, but its intensity falls off exponentially with distance. This surface electromagnetic field, called the “evanescent wave”, can selectively excite fluorescent molecules in the liquid near the interface. The thin evanescent optical field at the interface provides low background and facilitates the detection of single molecules with high signal-to-noise ratio at visible wavelengths.

The evanescent field also can image fluorescently-labeled nucleotides upon their incorporation into the attached template/primer complex in the presence of a polymerase. Total internal reflection fluorescence microscopy is then used to visualize the attached template/primer duplex and/or the incorporated nucleotides with single molecule resolution.
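The exponential fall-off of the evanescent field can be illustrated numerically. The following Python sketch uses the standard textbook expression for the 1/e intensity penetration depth of an evanescent wave; it is not taken from the present disclosure, and the function names, the refractive indices (about 1.52 for glass, about 1.33 for an aqueous buffer), and the 70 degree incidence angle are illustrative assumptions only.

```python
import math


def penetration_depth_nm(wavelength_nm, n_solid, n_liquid, incidence_deg):
    """Return the 1/e intensity decay depth of the evanescent field, in nm.

    Standard optics relation (an assumption, not from the specification):
        d = wavelength / (4 * pi * sqrt(n_solid^2 * sin^2(theta) - n_liquid^2))
    """
    theta = math.radians(incidence_deg)
    critical = math.asin(n_liquid / n_solid)
    if theta <= critical:
        raise ValueError("incidence angle must exceed the critical angle")
    term = (n_solid * math.sin(theta)) ** 2 - n_liquid ** 2
    return wavelength_nm / (4.0 * math.pi * math.sqrt(term))


def relative_intensity(z_nm, depth_nm):
    """Intensity at distance z from the interface, relative to the interface."""
    return math.exp(-z_nm / depth_nm)


# Hypothetical values: 635 nm excitation, glass/water interface, 70 degrees.
depth = penetration_depth_nm(635.0, 1.52, 1.33, 70.0)
```

Under these assumed values the depth is on the order of 100 nm, which is consistent with the statement that only fluorophores near the interface are excited.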

F. Analysis

Alignment and/or compilation of sequence results obtained from the image stacks produced as generally described above utilizes look-up tables that take into account possible sequence changes (due, e.g., to errors, mutations, etc.). Essentially, sequencing results obtained as described herein are compared to a look-up table that contains all possible reference sequences plus 1 or 2 base errors.

In resequencing, a preferred embodiment for sequence alignment compares sequences obtained to a database of reference sequences of the same length, or within 1 or 2 bases of the same length, from the initially obtained sequence or the target sequence contained in a look-up table format. In a preferred embodiment, the look-up table contains exact matches with respect to the reference sequence and sequences of the prescribed length or lengths that have one or two errors (e.g., 9-mers with all possible 1-base or 2-base errors). The obtained sequences are then matched to the sequences on the look-up table and given a score that reflects the uniqueness of the match to sequence(s) in the table. The obtained sequences are then aligned to the reference sequence based upon the position at which the obtained sequence best matches a portion of the reference sequence. More detail on the alignment process is provided below in the Example.
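By way of illustration only, a look-up table of the kind described above might be constructed as in the following Python sketch. The sketch assumes that an "error" is a single-base substitution (insertions and deletions are not modeled), and the names `variants`, `build_lookup`, and `uniqueness_score`, as well as the particular scoring rule, are hypothetical and are not prescribed by the invention.

```python
from itertools import combinations, product

BASES = "ACGT"


def variants(kmer, max_errors=2):
    """Yield (sequence, n_errors) for every sequence that differs from
    kmer by 0..max_errors substitutions (assumption: substitutions only)."""
    yield kmer, 0
    for n in range(1, max_errors + 1):
        for positions in combinations(range(len(kmer)), n):
            choices = [[b for b in BASES if b != kmer[p]] for p in positions]
            for replacement in product(*choices):
                seq = list(kmer)
                for p, b in zip(positions, replacement):
                    seq[p] = b
                yield "".join(seq), n


def build_lookup(reference, k, max_errors=2):
    """Map each k-mer variant to the set of (start, n_errors) it may come from."""
    table = {}
    for start in range(len(reference) - k + 1):
        for seq, n_err in variants(reference[start:start + k], max_errors):
            table.setdefault(seq, set()).add((start, n_err))
    return table


def uniqueness_score(table, read):
    """Hypothetical score: 1.0 for a read that maps to a single reference
    position, lower for ambiguous reads, 0.0 for reads not in the table."""
    hits = table.get(read, set())
    return 1.0 / len({start for start, _ in hits}) if hits else 0.0
```

For a 9-mer, this enumeration yields 1 exact sequence, 27 one-error sequences, and 324 two-error sequences, so the table grows quickly with the reference length; a practical implementation would likely use a more compact index.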

EXAMPLE

The 7249 nucleotide genome of the bacteriophage M13mp18 was sequenced using single molecule methods of the invention. Purified, single-stranded viral M13mp18 genomic DNA was obtained from New England Biolabs. Approximately 25 ug of M13 DNA was digested to an average fragment size of 40 bp with 0.1 U DNase I (New England Biolabs) for 10 minutes at 37° C. Digested DNA fragment sizes were estimated by running an aliquot of the digestion mixture on a precast denaturing (TBE-Urea) 10% polyacrylamide gel (Novagen) and staining with SYBR Gold (Invitrogen/Molecular Probes). The DNase I-digested genomic DNA was filtered through a YM10 ultrafiltration spin column (Millipore) to remove small digestion products less than about 30 nt. Approximately 20 pmol of the filtered DNase I digest was then polyadenylated with terminal transferase according to known methods (Roychoudhury, R and Wu, R. 1980, Terminal transferase-catalyzed addition of nucleotides to the 3′ termini of DNA. Methods Enzymol. 65(1):43-62.). The average dA tail length was 50 +/− 5 nucleotides. Terminal transferase was then used to label the fragments with Cy3-dUTP. Fragments were then terminated with dideoxyTTP (also added using terminal transferase). The resulting fragments were again filtered with a YM10 ultrafiltration spin column to remove free nucleotides and stored in ddH₂O at −20° C.

Epoxide-coated glass slides were prepared for oligo attachment. Epoxide-functionalized 40 mm diameter #1.5 glass cover slips (slides) were obtained from Erie Scientific (Salem, N.H.). The slides were preconditioned by soaking in 3×SSC for 15 minutes at 37° C. Next, a 500 pM aliquot of 5′ aminated polydT(50) primer (polythymidine of 50 nucleotides in length with a 5′ terminal amine) is incubated with each slide for 30 minutes at room temperature in a volume of 80 ml. The resulting slides have primer attached by direct amine linkage to the epoxide. The slides are then treated with phosphate (1 M) for 4 hours at room temperature in order to passivate the surface. Slides are then stored in polymerase rinse buffer (20 mM Tris, 100 mM NaCl, 0.001% Triton X-100, pH 8.0) until they are used for sequencing.

For sequencing, the slides are placed in a modified FCS2 flow cell (Bioptechs, Butler, Pa.) using a 50 um thick gasket. The flow cell is placed on a movable stage that is part of a high-efficiency fluorescence imaging system built around a Nikon TE-2000 inverted microscope equipped with a total internal reflection (TIR) objective. The slide is then rinsed with HEPES buffer with 100 mM NaCl and equilibrated to a temperature of 50° C. An aliquot of poly(dT50) template is placed in the flow cell and incubated on the slide for 15 minutes. After incubation, the flow cell is rinsed with 1×SSC/HEPES/0.1% SDS followed by HEPES/NaCl. A passive vacuum apparatus is used to pull fluid across the flow cell. The resulting slide contains M13 template/primer duplex. The temperature of the flow cell is then reduced to 37° C. for sequencing and the objective is brought into contact with the flow cell.

For sequencing, cytidine triphosphate, guanosine triphosphate, adenosine triphosphate, and uridine triphosphate, each having a cyanine-5 label (at the 7-deaza position for ATP and GTP and at the C5 position for CTP and UTP (PerkinElmer)) are stored separately in buffer containing 20 mM Tris-HCl, pH 8.8, 10 mM MgSO₄, 10 mM (NH₄)₂SO₄, 10 mM HCl, and 0.1% Triton X-100, and 100 U Klenow exo⁻ polymerase (NEN). Sequencing proceeds as follows.

First, initial imaging is used to determine the positions of duplex on the epoxide surface. The Cy3 label attached to the M13 templates is imaged by excitation using a laser tuned to 532 nm radiation (Verdi V-2 Laser, Coherent, Inc., Santa Clara, Calif.) in order to establish duplex position. For each slide only single fluorescent molecules imaged in this step are counted. Imaging of incorporated nucleotides as described below is accomplished by excitation of a cyanine-5 dye using a 635 nm radiation laser (Coherent). 5 uM Cy5CTP is placed into the flow cell and exposed to the slide for 2 minutes. After incubation, the slide is rinsed in 1×SSC/15 mM HEPES/0.1% SDS/pH 7.0 (“SSC/HEPES/SDS”) (15 times in 60 ul volumes each), followed by 150 mM HEPES/150 mM NaCl/pH 7.0 (“HEPES/NaCl”) (10 times at 60 ul volumes). An oxygen scavenger containing 30% acetonitrile and scavenger buffer (134 ul HEPES/NaCl, 24 ul 100 mM Trolox in MES, pH 6.1, 10 ul DABCO in MES, pH 6.1, 8 ul 2M glucose, 20 ul NaI (50 mM stock in water), and 4 ul glucose oxidase) is next added. The slide is then imaged (500 frames) for 0.2 seconds using an Inova301K laser (Coherent) at 647 nm, followed by green imaging with a Verdi V-2 laser (Coherent) at 532 nm for 2 seconds to confirm duplex position. The positions having detectable fluorescence are recorded. After imaging, the flow cell is rinsed 5 times each with SSC/HEPES/SDS (60 ul) and HEPES/NaCl (60 ul). Next, the cyanine-5 label is cleaved off incorporated CTP by introduction into the flow cell of 50 mM TCEP for 5 minutes, after which the flow cell is rinsed 5 times each with SSC/HEPES/SDS (60 ul) and HEPES/NaCl (60 ul). The remaining nucleotide is capped with 50 mM iodoacetamide for 5 minutes followed by rinsing 5 times each with SSC/HEPES/SDS (60 ul) and HEPES/NaCl (60 ul). The scavenger is applied again in the manner described above, and the slide is again imaged to determine the effectiveness of the cleave/cap steps and to identify non-incorporated fluorescent objects.

The procedure described above is then conducted with 100 nM Cy5dATP, followed by 100 nM Cy5dGTP, and finally 500 nM Cy5dUTP. The procedure (expose to nucleotide, polymerase, rinse, scavenger, image, rinse, cleave, rinse, cap, rinse, scavenger, final image) is repeated exactly as described for ATP, GTP, and UTP except that Cy5dUTP is incubated for 5 minutes instead of 2 minutes. Uridine is used instead of Thymidine due to the fact that the Cy5 label is incorporated at the position normally occupied by the methyl group in Thymidine triphosphate, thus turning the dTTP into dUTP. In all, 64 cycles (C, A, G, U) are conducted as described in this and the preceding paragraph.

Once the desired number of cycles is completed, the image stack data (i.e., the single molecule sequences obtained from the various surface-bound duplex) are aligned to the M13 reference sequence. The image data obtained can be compressed to collapse homopolymeric regions. Thus, the sequence “TCAAAGC” is represented as “TCAGC” in the data tags used for alignment. Similarly, homopolymeric regions in the reference sequence are collapsed for alignment.
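The homopolymer compression described above can be sketched in a few lines of Python. The function name `collapse_homopolymers` is hypothetical, and the sketch is offered only as one possible way to perform the compression; the same function would be applied to both the obtained sequences and the reference sequence.

```python
from itertools import groupby


def collapse_homopolymers(seq):
    """Replace every run of identical consecutive bases with a single base."""
    return "".join(base for base, _run in groupby(seq))


tag = collapse_homopolymers("TCAAAGC")  # "TCAGC", as in the example above
```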

The alignment algorithm matches sequences obtained as described above with the actual M13 linear sequence. Placement of obtained sequence on M13 is based upon the best match between the obtained sequence and a portion of M13 of the same length, taking into consideration 0, 1, or 2 possible errors. All obtained 9-mers with 0 errors (meaning that they exactly match a 9-mer in the M13 reference sequence) are first aligned with M13. Then 10-, 11-, and 12-mers with 0 or 1 error are aligned. Finally, all 13-mers or greater with 0, 1, or 2 errors are aligned.
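The length-dependent error tolerance described above can be sketched as follows. This Python sketch is illustrative only: it assumes that errors are substitutions, that sequences shorter than 9 bases are not aligned, and that ties between equally good positions are resolved in favor of the first position found. The names `allowed_errors` and `place_read` are hypothetical, and a direct scan of the reference is used here for clarity in place of a look-up table.

```python
def allowed_errors(length):
    """Maximum number of errors tolerated for a read of the given length."""
    if length < 9:
        return None  # assumption: reads shorter than 9 bases are not aligned
    if length == 9:
        return 0
    if length <= 12:
        return 1
    return 2


def place_read(read, reference):
    """Return (errors, start) for the best placement of read on reference,
    or None if no window of the same length is within the allowed errors."""
    limit = allowed_errors(len(read))
    if limit is None:
        return None
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        errors = sum(a != b for a, b in zip(read, window))
        if errors <= limit and (best is None or errors < best[0]):
            best = (errors, start)
    return best
```

In use, the obtained sequences would be processed in the order given above: exact 9-mers first, then 10- to 12-mers, then sequences of 13 bases or more.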

The template fragments are removed by increasing the temperature of the flow cell above the melting temperature of the duplex, thereby releasing the template fragments from the duplexes. The free templates are removed from the flow cell by washing the flow cell; for example, the flow cell can be rinsed 5 times each with SSC/HEPES/SDS (60 ul) and HEPES/NaCl (60 ul).

The primers are then modified by adding a polynucleotide sequence to the 3′ terminus of the primer. The oligonucleotide-modified primers are then used as the template in subsequent polymerization reactions. Free primer capable of hybridizing to the added oligonucleotide is added to the flow cell and incubated under conditions sufficient to allow hybridization between the added oligonucleotide portion of the template and the free primer. After incubation, the flow cell is rinsed with 1×SSC/HEPES/0.1% SDS followed by HEPES/NaCl. The resulting slide contains template/primer duplexes where the template comprises the original primer having M13 template complementary sequences added thereto and modified with an oligonucleotide. The temperature of the flow cell is then reduced to 37° C. for sequencing and the objective is brought into contact with the flow cell. The procedure (expose to nucleotide, polymerase, rinse, scavenger, image, rinse, cleave, rinse, cap, rinse, scavenger, final image) is repeated as described above.

Once the desired number of cycles is completed, the image stack data (i.e., the single molecule sequences obtained from the various surface-bound duplex) are aligned to the M13 reference sequence and/or are aligned to the sequence initially obtained as described above. The image data obtained can be compressed to collapse homopolymeric regions as described above.
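Because the second sequencing pass reads a strand complementary to the one obtained in the first pass, the two results can be compared after taking the reverse complement of one of them. The following Python sketch illustrates such a comparison. It assumes, for simplicity, that both passes cover exactly the same region and have equal length, and that uridine reads are treated as thymidine; the names `reverse_complement` and `compare_passes` are hypothetical.

```python
# Complement table; U is treated like T (assumption for labeled dUTP reads).
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the reverse complement of a sequence, written with T (not U)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def compare_passes(first_pass, second_pass):
    """Return the positions (in first-pass coordinates) at which the first
    pass disagrees with the reverse complement of the second pass."""
    first = first_pass.upper().replace("U", "T")
    expected = reverse_complement(second_pass)
    if len(first) != len(expected):
        raise ValueError("this sketch assumes both passes cover the same region")
    return [i for i, (a, b) in enumerate(zip(first, expected)) if a != b]
```

Positions returned by `compare_passes` mark candidate misincorporations in one of the two passes; positions at which both passes agree can be reported with greater confidence.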

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

1. A method of increasing accuracy of nucleic acid sequencing, the method comprising the steps of: a) exposing a duplex comprising a template and a primer to a polymerase and one or more nucleotide comprising a detectable label under conditions sufficient for template-dependent nucleotide addition to said primer, wherein said template is individually optically resolvable; b) identifying nucleotide incorporated into said primer; c) repeating steps a) and b), thereby determining a nucleotide sequence; d) removing the template from the primer of step c); e) adding a polynucleotide to a 3′ terminus of the primer of step d) to form a template; f) exposing the template of step e) to a primer capable of hybridizing to said added polynucleotide to form template/primer duplex, and repeating steps a) through c) to sequence a portion of the template, wherein at least a portion of the sequence obtained is complementary to the nucleotide sequence of c), thereby increasing the accuracy of nucleic acid sequencing.
2. The method of claim 1, wherein the sequence obtained in c) is compared with a complement of the sequence obtained in f).
3. The method of claim 1, further comprising repeating steps d) and f).
4. The method of claim 1, wherein said label is an optically-detectable label.
5. The method of claim 4, wherein said optically-detectable label is a fluorescent label.
6. The method of claim 5, wherein said fluorescent label is selected from the group consisting of fluorescein, rhodamine, cyanine, Cy5, Cy3, BODIPY, alexa, and derivatives thereof.
7. The method of claim 1, wherein said duplex is attached to a surface.
8. The method of claim 1, wherein said primer is attached to a surface.